In this paper, we present a method to project co-authorship networks that accounts in detail for
the geometrical structure of scientists' collaborations. By restricting the scope to 3-body interactions,
we focus on the number of triangles in the system, and show the importance of multi-scientist (more
than 2) collaborations in the social network. This motivates the introduction of generalized networks,
where basic connections are not binary, but involve an arbitrary number of components. We focus on
the 3-body case, and study the percolation transition numerically.
I. INTRODUCTION



It is well known in statistical physics that N-body correlations have to be
carefully described in order to characterize the statistical properties of
complex systems. For instance, in the case of the Liouville equation for
Hamiltonian dynamics, this problem is at the heart of the derivation of the
reduced BBGKY hierarchy, thereby leading to the Boltzmann and Enskog theories
for fluids. In this line of thought, it is essential to discriminate N-body
correlations that are due to intrinsic N-body interactions from those that
merely develop from lower order interactions. This issue is directly related
to a well-known problem in complex network theory, i.e. the "projection" of
bipartite networks onto simplified structures. As a paradigm for such systems,
people usually consider co-authorship networks, namely networks composed of
two kinds of nodes, e.g. the scientists and the articles, with links running
between scientists and the papers they wrote. In that case, the usual
projection method consists in focusing e.g. on the scientist nodes, and in
drawing a link between them if they co-authored a common paper (see Fig. 1).
As a result, the projected system is a unipartite network of scientists that
characterizes the community structure of science collaborations. Such
studies have been very active recently, due to their complex social structure,
to the ubiquity of such bipartite networks in complex systems, and to the large
databases available.

FIG. 1: Usual projection method of the bipartite graph on a
unipartite scientists graph.

A standard quantity of interest in order to characterize the structure of the
projected network is the clustering coefficient, which measures network
"transitivity", namely the probability that two of a scientist's co-authors
have themselves co-authored a paper. In topological terms, it is a measure of
the density of triangles in a network, a triangle being formed every time two
of one's collaborators collaborate with each other. This coefficient is usually
very high in systems where sociological cliques develop. However, part of the
clustering in co-authorship networks is due to papers with three or more
co-authors. Such papers introduce trivial triangles of collaborating authors,
thereby increasing the clustering coefficient. This problem, which was raised
by Newman et al., was circumvented by studying directly the bipartite network
in order to infer the authors' community structure. Newman et al. showed on
some examples that these high order interactions may account for one half of
the clustering coefficient. One should note, however, that while this approach
offers a well-defined theoretical framework for bipartite networks, it suffers
from a lack of transparency as compared to the original projection method,
i.e. it does not allow a clear visualisation of the unipartite structure.
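
As an illustration of the point above, the following minimal sketch builds the usual projection from a list of papers and measures its transitivity. The helper names `project` and `transitivity`, and the toy author labels, are assumptions introduced here for illustration; they are not taken from the data set. Transitivity is computed as the fraction of connected triples that are closed into a triangle.

```python
from itertools import combinations


def project(papers):
    """Usual projection: link two authors if they share at least one paper."""
    neighbours = {}
    for authors in papers:
        for a in authors:
            neighbours.setdefault(a, set())
        for a, b in combinations(sorted(set(authors)), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


def transitivity(neighbours):
    """Fraction of connected triples (paths of length 2) that are closed."""
    closed = 0   # every triangle is counted three times, once per centre node
    triples = 0  # connected triples, counted once per centre node
    for nbrs in neighbours.values():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        closed += sum(1 for u, v in combinations(nbrs, 2) if v in neighbours[u])
    return closed / triples if triples else 0.0


# One three-author paper: the projection is a "trivial" triangle.
one_paper = [("A", "B", "C")]
# Two two-author papers sharing author B: an open path, no triangle.
two_papers = [("A", "B"), ("B", "C")]

print(transitivity(project(one_paper)))   # 1.0
print(transitivity(project(two_papers)))  # 0.0
```

A single three-author paper is enough to saturate the transitivity of its sub-network, which is the effect that the projection hides.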

In this article, we propose an alternative approach that is based on a more
refined unipartite projection, and follows the usual expansion methods of
Statistical Mechanics. To do so, we focus on a small dataset, retrieved from
the arXiv database and composed of articles dedicated to complex network
theory. This choice is motivated by their relatively few co-authors per
article, a property typical of theoretical physics papers. Our method consists
in discriminating the different kinds of scientists' collaborations, based upon
the number of co-authors per article. This discrimination leads to a diagram
representation of co-authorship (see also related work on the applicability of
Feynman diagrams in complex networks). The resulting N-body projection
reconciles the visual features of the usual projection and the exact
description of Newman's theoretical approach. Empirical results confirm the
importance of high order collaborations in the network structure.
Therefore, we introduce in the last
section a simple network model that is based on random triangular connections
between the nodes. We study percolation numerically for the model.

FIG. 2: Histogram of the number of scientists per article, n.
The dashed line corresponds to an exponential fit.

FIG. 3: Graphical representation of the 4 most basic authors
interactions, namely 1, 2, 3, 4 co-authorships.

II. N-BODY PROJECTION METHOD 

The data set contains all articles from arXiv in the time interval
[1995 : 2005] that contain the word "network" in their abstract and are
classified as "cond-mat". In order to discriminate the authors and avoid
spurious data, we checked the surnames and the first names of the authors.
Moreover, in order to avoid multiple ways for an author to cosign a paper, we
also took into account the initials of the first names. For instance, Marcel
Ausloos and M. Ausloos are the same person, while Marcel Ausloos and Mike
Ausloos are considered to be different. Let us stress that this method may
lead to ambiguities if an initial refers to two different first names, e.g.
M. Ausloos might be Marcel or Mike Ausloos. Nonetheless, we have verified that
this case occurs only once in the data set (Hawoong, Hyeong-Chai and H. Jeong),
so that its effects are negligible. In that sole case, we attributed the papers
of H. Jeong to the most prolific author (Hawoong Jeong in the dataset). Given
this identification method, we find $n_p = 2533$ persons and $n_a = 1611$
articles. The distribution of the number of co-authors per article (Fig. 2)
clearly shows a rapid exponential decrease, associated with a clear
predominance of small collaborations, as expected.
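
The identification rule described above can be sketched as follows. This is only an illustrative reading of the procedure, not the script actually used: the functions `split_name` and `identify` are hypothetical, names are assumed to be given as "First Last" strings, and ambiguous initials are merely flagged here rather than attributed to the most prolific author.

```python
from collections import defaultdict


def split_name(name):
    """'Marcel Ausloos' -> ('ausloos', 'marcel'); 'M. Ausloos' -> ('ausloos', 'm')."""
    parts = name.replace(".", " ").split()
    return parts[-1].lower(), parts[0].lower()


def identify(names):
    """Map raw names to author keys; a bare initial is merged with a full
    first name only when exactly one full first name matches it."""
    full = defaultdict(set)  # (surname, initial) -> full first names seen
    for name in names:
        surname, first = split_name(name)
        if len(first) > 1:
            full[(surname, first[0])].add(first)

    mapping, ambiguous = {}, []
    for name in names:
        surname, first = split_name(name)
        if len(first) > 1:
            mapping[name] = (surname, first)
            continue
        candidates = sorted(full[(surname, first)])
        if len(candidates) == 1:
            mapping[name] = (surname, candidates[0])
        else:
            mapping[name] = (surname, first)  # keep the initial as its own key
            if len(candidates) > 1:
                ambiguous.append(name)        # needs a manual decision
    return mapping, ambiguous


names = ["Marcel Ausloos", "M. Ausloos",
         "Hawoong Jeong", "Hyeong-Chai Jeong", "H. Jeong"]
mapping, ambiguous = identify(names)
print(mapping["M. Ausloos"])  # ('ausloos', 'marcel')
print(ambiguous)              # ['H. Jeong']
```

The flagged list plays the role of the single ambiguous case reported in the text, which was then resolved by hand.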

Formally, the bipartite structure authors-papers may be mapped exactly on the
vector of matrices $\mathcal{M}$ defined by:

    $\mathcal{M} = [M^{(1)}, M^{(2)}, \ldots, M^{(j)}, \ldots, M^{(n_p)}]$    (1)

where $M^{(j)}$ is a square $n_p^j$ matrix, i.e. an array with j indices each
running over the $n_p$ authors, that accounts for all articles co-authored by
j scientists. By definition, the element $M^{(j)}_{a_1 \ldots a_j}$ is equal to
the number of collaborations between the j authors $a_1 \ldots a_j$. In the
following, we assume that co-authorship is not a directed relation, thereby
neglecting the position of the authors in the collaboration, e.g. whether or
not the author is the first author. This implies that the matrices are
symmetric under permutations of indices. Moreover, as people cannot collaborate
with themselves, the diagonal elements $M^{(j)}_{a a \ldots a}$ (for
$j \geq 2$) vanish by construction. For example, $M^{(1)}_{a_1}$ and
$M^{(2)}_{a_1 a_2}$ represent respectively the total number of papers written
by $a_1$ alone, and the total number of papers written by the pair
$(a_1, a_2)$.

A way to visualize $\mathcal{M}$ consists in a network whose nodes are the
scientists, and whose links are discriminated by their shape. The intrinsic
co-authorship interactions form loops (order 1), lines (order 2), triangles
(order 3), etc. (see Fig. 3). To represent the intensity of the multiplet
interaction, the width of the lines is taken to be proportional to the number
of collaborations of this multiplet. Altogether, these rules lead to a
graphical representation of $\mathcal{M}$ that is much more refined than the
usual projection method (Fig. 1).
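
A possible data structure for the vector of matrices is sketched below, under the assumption that each paper is given as a tuple of author names. Because the matrices are symmetric under permutations of indices, each multiplet is stored once, as a sorted tuple, in a sparse counter indexed by the order j. The function name `build_M` is hypothetical, and the toy papers are those listed in the caption of Fig. 4.

```python
from collections import Counter, defaultdict


def build_M(papers):
    """Sparse version of the vector of matrices: M[j][(a_1, ..., a_j)] is the
    number of papers written by exactly the j authors a_1 ... a_j."""
    M = defaultdict(Counter)
    for authors in papers:
        key = tuple(sorted(set(authors)))  # symmetric: order does not matter
        M[len(key)][key] += 1
    return M


# Papers of the small sub-network of Fig. 4.
papers = (
    [("Timme", "Ashwin")]
    + 3 * [("Timme", "Wolf", "Geisel")]
    + [("Geisel", "Hufnagel", "Brockmann")]
    + [("Timme", "Wolf", "Geisel", "Zumdieck")]
)
M = build_M(papers)

print(M[2][("Ashwin", "Timme")])           # 1
print(M[3][("Geisel", "Timme", "Wolf")])   # 3 (drawn as a link three times wider)
print(sum(M[3].values()))                  # 4 three-author collaborations
```

The multiplicity stored for each multiplet is the quantity that sets the line width in the graphical representation.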

FIG. 4: Graphical representation of the co-authorship network. This small
sub-network accounts for 1 two-author collaboration (Timme, Ashwin);
4 three-author collaborations, namely 3 times (Timme, Wolf, Geisel) and once
(Geisel, Hufnagel, Brockmann); and 1 four-author collaboration (Timme, Wolf,
Geisel, Zumdieck). Because the triplet (Timme, Wolf, Geisel) collaborates three
times, its links are three times wider than the other links.

It is important to point out that the vector of matrices $\mathcal{M}$
describes the bipartite network without approximation, and that it is
reminiscent of the Liouville distribution in the phase space of a Hamiltonian
system. Accordingly, a relevant macroscopic description of the system relies on
a coarse-grained reduction of its internal variables. The simplest reduced
matrix is the one-scientist matrix $R^{(1)}$, which is obtained by summing over
the N-body connections, $N \geq 2$:

    $R^{(1)}_{a_1} = M^{(1)}_{a_1} + \sum_{a_2} M^{(2)}_{a_1 a_2} + \sum_{a_2} \sum_{a_3 < a_2} M^{(3)}_{a_1 a_2 a_3} + \ldots + \sum_{a_2} \cdots \sum_{a_j < a_{j-1}} M^{(j)}_{a_1 \ldots a_j} + \ldots$    (2)

It is straightforward to show that the elements $R^{(1)}_{a_1}$ denote the
total number of articles written by the scientist $a_1$. The second order
matrix reads:

    $R^{(2)}_{a_1 a_2} = M^{(2)}_{a_1 a_2} + \sum_{a_3} M^{(3)}_{a_1 a_2 a_3} + \ldots + \sum_{a_3} \cdots \sum_{a_j < a_{j-1}} M^{(j)}_{a_1 \ldots a_j} + \ldots$    (3)

Its elements represent the total number of articles written by the pair of
scientists $(a_1, a_2)$. Remarkably, this matrix reproduces the usual
projection method (Fig. 1), and obviously simplifies the structure of the
bipartite network by hiding the effect of high order connections. The
three-scientist matrix reads similarly:

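    $R^{(3)}_{a_1 a_2 a_3} = M^{(3)}_{a_1 a_2 a_3} + \sum_{a_4} M^{(4)}_{a_1 \ldots a_4} + \ldots + \sum_{a_4} \cdots \sum_{a_j < a_{j-1}} M^{(j)}_{a_1 \ldots a_j} + \ldots$    (4)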

This new matrix counts the number of papers co-written by the triplet
$(a_1, a_2, a_3)$, and may be represented by a network whose links are
triangles relating three authors. The generalization to higher order matrices
$R^{(j)}$ is straightforward but, as in the case of the BBGKY hierarchy, a
truncation of the vector $\mathcal{M}$ must be fixed at some level in order to
describe the system usefully and compactly. It is therefore important to point
out that the knowledge of $M^{(2)}$ together with $R^{(3)}$ is completely
sufficient in order to characterize the triangular structure of $\mathcal{M}$.
Consequently, in this paper, we stop the reduction procedure at the 3-body
level, and define the triangular projection of $\mathcal{M}$ by the
application:

    $[M^{(1)}_{a_1}, M^{(2)}_{a_1 a_2}, M^{(3)}_{a_1 a_2 a_3}, \ldots, M^{(n_p)}_{a_1 \ldots a_{n_p}}] \rightarrow [M^{(1)}_{a_1}, M^{(2)}_{a_1 a_2}, R^{(3)}_{a_1 a_2 a_3}]$    (5)

FIG. 5: 3-body projection of the bipartite network. For the sake of clarity,
we focus on a small sub-cluster, centered around the collaborations of
M. Newman. The upper figure is the usual projection method. The lower figure is
the triangular projection of the same bipartite network.
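
The reduction of Eqs. (2)-(4) amounts to counting, for each group of k authors, every paper in which they all appear, whatever the total number of co-authors. A minimal sketch follows, assuming the sparse storage `{j: {sorted author tuple: count}}` for the vector of matrices; the function name `reduced` and this storage format are illustrative assumptions, and the toy data are those of the caption of Fig. 4.

```python
from collections import Counter
from itertools import combinations


def reduced(M, k):
    """Reduced matrix R^(k): for each k-subset of authors, the number of
    papers of any order j >= k in which all of them appear."""
    R = Counter()
    for j, block in M.items():
        if j < k:
            continue
        for authors, count in block.items():
            for subset in combinations(authors, k):
                R[subset] += count
    return R


# Sparse vector of matrices for the sub-network of Fig. 4.
M = {
    2: {("Ashwin", "Timme"): 1},
    3: {("Geisel", "Timme", "Wolf"): 3,
        ("Brockmann", "Geisel", "Hufnagel"): 1},
    4: {("Geisel", "Timme", "Wolf", "Zumdieck"): 1},
}

R1, R2, R3 = reduced(M, 1), reduced(M, 2), reduced(M, 3)
print(R1[("Timme",)])                   # 5 papers written by Timme
print(R2[("Timme", "Wolf")])            # 4 papers shared by the pair
print(R3[("Geisel", "Timme", "Wolf")])  # 4 = 3 intrinsic + 1 from the 4-author paper

# Triangular projection of Eq. (5): keep M^(1), M^(2) and R^(3) only.
projection = (M.get(1, {}), M.get(2, {}), R3)
```

Here `R2` is the usual projection with weights, while `R3` keeps track of which triangles stem from a genuine collaboration of at least three authors.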
FIG. 6: Percolation transition in the ERN^3 model with 50 nodes, from a dilute
phase with small disconnected islands (8 triangles) to a percolated phase with
one giant cluster (20 triangles).

FIG. 7: Proportion of nodes in the main island, as a function of the number of
links/node, in the ERN and the ERN^3 model.

The triangular projection is depicted in Fig. 5 and compared to the usual
projection method. In order to test the relevance of this description, we have
measured in the data set the total number of triangles generated by edges. We
discriminate two kinds of triangles: those which arise from one 3-body
interaction of $R^{(3)}$, and those which arise only from an interplay of
different interactions. There are respectively 5550 and 30 such triangles,
namely 99.5% of triangles are of the first kind. This observation by itself
therefore justifies the detailed projection method introduced in this section,
and shows the importance of the geometry of co-authorship links in the
characterization of network structures, specifically the clustering
coefficient in the present case.
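
The two kinds of triangles can be told apart as in the sketch below, which assumes that papers are given as tuples of author names. A triangle of the usual projection is called "intrinsic" when its three authors appear together in at least one paper, and "interplay" otherwise. The function name and the toy papers are hypothetical and are not taken from the arXiv data set.

```python
from itertools import combinations


def classify_triangles(papers):
    """Return (intrinsic, interplay): triangles of the usual projection that
    are covered by a single paper, and those that are not."""
    pairs, triplets = set(), set()
    for authors in papers:
        authors = sorted(set(authors))
        pairs.update(combinations(authors, 2))     # edges of the projection
        triplets.update(combinations(authors, 3))  # non-zero entries of R^(3)

    neighbours = {}
    for a, b in pairs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    intrinsic = interplay = 0
    for a, b in pairs:                             # a < b by construction
        for c in neighbours[a] & neighbours[b]:
            if c > b:                              # count each triangle once
                if (a, b, c) in triplets:
                    intrinsic += 1
                else:
                    interplay += 1
    return intrinsic, interplay


papers = [("A", "B", "C"),                    # one 3-body interaction
          ("C", "D"), ("D", "E"), ("C", "E")] # three pairs closing a triangle
print(classify_triangles(papers))             # (1, 1)
print(classify_triangles([("A", "B", "C", "D")]))  # (4, 0)
```

A four-author paper alone contributes four intrinsic triangles, which illustrates how quickly large collaborations inflate the triangle count of the usual projection.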



III. TRIANGULAR ERDŐS-RÉNYI NETWORKS

The empirical results of the previous section have shown the significance of
N-body connections in social networks. A more complete framework for networks
is therefore required in order to describe correctly the system complexity. In
this article, we focus on the simplest generalization, namely a network whose
links relate triplets of nodes. To do so, we base our modeling on the
Erdős-Rényi uncorrelated random graph, i.e. the usual prototype to be compared
with more complex random graphs. The usual Erdős-Rényi network (ERN) is
composed of $N_n$ labeled nodes connected by $N_e^{(2)}$ edges, which are
chosen randomly from the $N_n(N_n - 1)/2$ possible edges. In this paper, we
define the triangular ER network (ERN^3) to be composed of $N_n$ labeled nodes,
connected by $N_e^{(3)}$ triangles, which are chosen randomly from the
$N_n(N_n - 1)(N_n - 2)/6$ possible triangles. As a result, connections in the
system relate triplets of nodes $(a_1, a_2, a_3)$, and the vector of matrices
$\mathcal{M}$ reduces to the matrix $M^{(3)}$. Before going further, let us
point out that the clustering coefficient of triangular ER networks is very
high by construction but, contrary to intuition, it is different from 1 in
general. For instance, for the two triplets $(a_1, a_2, a_3)$ and
$(a_1, a_4, a_5)$, the local clustering coefficient of $a_1$ is equal to 1/3:
$a_1$ has four neighbours, hence six possible pairs of neighbours, of which
only $(a_2, a_3)$ and $(a_4, a_5)$ are linked.
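
The example above can be checked numerically with the short sketch below. It assumes that the two triplets share only the node $a_1$ (labelled 1 in the code), and it uses the standard definition of the local clustering coefficient, i.e. the fraction of pairs of neighbours of a node that are themselves linked. The function name `local_clustering` is introduced here for illustration.

```python
from itertools import combinations


def local_clustering(triangles, node):
    """Local clustering coefficient of `node` in the binary network
    generated by a list of triangles (triplets of node labels)."""
    neighbours = {}
    for tri in triangles:
        for a, b in combinations(tri, 2):
            neighbours.setdefault(a, set()).add(b)
            neighbours.setdefault(b, set()).add(a)

    nbrs = neighbours.get(node, set())
    k = len(nbrs)
    if k < 2:
        return 0.0
    linked = sum(1 for u, v in combinations(nbrs, 2) if v in neighbours[u])
    return linked / (k * (k - 1) / 2)


# Two triangles sharing only node 1 (assumed reading of the example).
print(local_clustering([(1, 2, 3), (1, 4, 5)], 1))  # 2 linked pairs out of 6
# A node that belongs to a single triangle is fully clustered.
print(local_clustering([(1, 2, 3), (1, 4, 5)], 2))  # 1.0
```

Only nodes shared by several triangles lower the clustering, so the coefficient stays high as long as the network is dilute.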

In this paper, we focus numerically on the percolation transition in ERN^3,
i.e. on the appearance of a giant component by increasing the number of links
in the system (Fig. 6). This transition is usually associated with dramatic
changes in the topological structure, which are crucial to ensure
communicability between network nodes, e.g. the spreading of scientific
knowledge in the case under study. In the following, we work at a fixed number
of nodes, and focus on the proportion of nodes in the main cluster as a
function of the number of binary links in the system. Moreover, in order to
compare results with the usual ERN, we do not count redundant links twice,
i.e. couples of authors who interact in different triplets. For instance, the
triplet $(a_1, a_2, a_3)$ accounts for 3 binary links, but $(a_1, a_2, a_3)$
and $(a_1, a_2, a_4)$ account together for 5 links, so that
$N_e^{(2)} \leq 3 N_e^{(3)}$ in general. In any case, this detailed counting
has only a small effect on the location of the percolation transition.
Numerical results are depicted in Fig. 7, where we consider networks with
$N_n = 1000$. Clearly, the triangular structure of interactions displaces the
bifurcation point, by requiring more links in order to observe the percolation
transition. This feature comes from the triangular structure of connections,
which restrains the network exploration as compared to random structures.
Indeed, 3 links relate only 3 nodes in ERN^3, while 3 links typically relate
4 nodes in ERN. Finally, let us stress that the same mechanism takes place in
systems with high clustering coefficients.
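
A numerical experiment of this kind can be sketched as follows. This is a minimal illustration rather than the code used for Fig. 7: the function names, the random seed and the sampled values of links/node are assumptions, and the main island is found with a standard union-find structure. Redundant links are counted once, as described in the text.

```python
import random
from itertools import combinations


def find(parent, x):
    """Root of x, with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def giant_fraction(n_nodes, edges):
    """Proportion of nodes in the largest connected component."""
    parent = list(range(n_nodes))
    size = [1] * n_nodes
    for a, b in edges:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            if size[ra] < size[rb]:
                ra, rb = rb, ra
            parent[rb] = ra
            size[ra] += size[rb]
    return max(size[find(parent, x)] for x in range(n_nodes)) / n_nodes


def ern_edges(n_nodes, n_links, rng):
    """Usual ER network: n_links distinct random edges."""
    edges = set()
    while len(edges) < n_links:
        a, b = rng.sample(range(n_nodes), 2)
        edges.add((min(a, b), max(a, b)))
    return edges


def ern3_edges(n_nodes, n_links, rng):
    """Triangular ER network: distinct random triangles are added until they
    generate at least n_links distinct binary links."""
    triangles, edges = set(), set()
    while len(edges) < n_links:
        tri = tuple(sorted(rng.sample(range(n_nodes), 3)))
        if tri not in triangles:
            triangles.add(tri)
            edges.update(combinations(tri, 2))  # redundant links counted once
    return edges


rng = random.Random(0)
n = 1000
for links_per_node in (0.25, 0.5, 1.0, 2.0):
    n_links = int(links_per_node * n)
    er = giant_fraction(n, ern_edges(n, n_links, rng))
    tri = giant_fraction(n, ern3_edges(n, n_links, rng))
    print(f"links/node = {links_per_node:4.2f}   ERN: {er:.3f}   ERN^3: {tri:.3f}")
```

Averaging over several realizations and a finer grid of links/node would be needed to draw smooth curves; a single run per point is kept here for brevity.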

IV. CONCLUSION 

In this paper, we show the importance of N-body interactions in co-authorship
networks. By focusing on data sets extracted from the arXiv database, we
introduce a way to project bipartite networks onto unipartite networks. This
approach generalizes usual projection methods by accounting for the complex
geometrical figures connecting authors. To do so, we present a simple
theoretical framework, and define N-body reduced and projected networks. The
graphical representation of these simplified networks rests on a "shape-based"
discrimination of the different co-authorship interactions (for a
"color-based" version, see the author's website), and allows a clear
visualization of the different mechanisms occurring in the system. Finally, we
apply the method to an arXiv data subset, thereby showing the importance of
such "high order corrections" in order to characterize the community structure
of scientists. The empirical results therefore motivate a more thorough study
of networks with complex weighted geometrical links. In the last section, we
focus on the simplest case by introducing a triangular random model, ERN^3.
Moreover, we restrict the scope by analyzing the effect of the 3-body
connection on percolation. A complete study of the topological properties of
ERN^3, as well as its generalization to higher order connections, is left for
a forthcoming work.